2021-01-19| Data | File(s) | Format | Source |
|---|---|---|---|
| “Nafot” | nafot.shp (+7) |
Shapefile | https://www.gov.il/he/Departments/Guides/info-gis |
| Railways | RAIL_STRATEGIC.shp (+7) |
Shapefile | https://data.gov.il/dataset/rail_strategic |
| Statistical areas | statisticalareas_demography2018.gdb |
GDB | https://www.cbs.gov.il/he/Pages/geo-layers.aspx |
The data and code for this lecture can be downloaded from:
…
For more details on setting up the environment and sample data, see the preparation document.
Figure 1: QGIS, an example of Graphical User Interface (GUI) software
Figure 2: R, an example of Command Line Interface (CLI) software
R is a programming language and environment, originally developed for statistical computing and graphics. Notable advantages of R are that it is a full-featured programming language, yet customized for working with data, relatively simple and has a huge collection of ~16,000 packages in the official repository from various areas of interest.Over time, there was an increasing number of contributed packages for handling and analyzing spatial data in R. Today, spatial analysis is a major functionality in R. As of October 2020, there are ~185 packages specifically addressing spatial analysis in R, and many more are indirectly related to spatial data.
Figure 3: Books on Spatial Data Analysis with R
Some important events in the history of spatial analysis support in R are summarized in Table 2.
| Year | Event |
|---|---|
| pre-2003 | Variable and incomplete approaches (MASS, spatstat, maptools, geoR, splancs, gstat, …) |
| 2003 | Consensus that a package defining standard data structures should be useful; rgdal released on CRAN |
| 2005 | sp released on CRAN; sp support in rgdal (Section ?? |
| 2008 | Applied Spatial Data Analysis with R, 1st ed. |
| 2010 | raster released on CRAN (Section ??) |
| 2011 | rgeos released on CRAN |
| 2013 | Applied Spatial Data Analysis with R, 2nd ed. |
| 2016 | sf released on CRAN (Section ??) |
| 2018 | stars released on CRAN (Section ??) |
| 2019 | Geocomputation with R (https://geocompr.robinlovelace.net/) |
| 2021(?) | Spatial Data Science (https://www.r-spatial.org/book/) |
The question that arises here is: can R be used as a Geographic Information System (GIS), or as a comprehensive toolbox for doing spatial analysis? The answer is definitely yes. Moreover, R has some important advantages over traditional approaches to GIS, i.e., software with graphical user interfaces such as ArcGIS or QGIS.
General advantages of Command Line Interface (CLI) software include:
Moreover, specific strengths of R as a GIS are:
Nevertheless, there are situations when other tools are needed:
mapedit package)The following sections (??–??) highlight some of the capabilities of spatial data analysis packages in R, through short examples.
Reading spatial layers from a file into an R data structure, or writing the R data structure into a file, are handled by external libraries:
sf: Geoprocessing Vector LayersGEOS is used for geometric operations on vector layers with sf:
Figure 4: Buffer function
stars: Geoprocessing RastersGeometric operations on rasters can be done with package raster:
+, -, …), Math (sqrt, log10, …), logical (!, ==, >, …), summary (mean, max, …), Mask, Overlay…gstat: Geostatistical ModellingUnivariate and multivariate geostatistics:
Figure 5: Predicted Zinc concentration, using Ordinary Kriging
spdep: Spatial dependence modelsModelling with spatial weights:
Figure 6: Neighbors list based on regions with contiguous boundaries
spatstat: Spatial point patternsTechniques for statistical analysis of spatial point patterns, such as:
Figure 7: Distance map for the Biological Cells point pattern dataset
RPostgreSQL: Working with PostGISPackage sf combined with RPostgreSQL can be used to read from, and write to, a PostGIS spatial database:
# library(sf)
# library(RPostgreSQL)
# con = dbConnect(
# PostgreSQL(),
# dbname = "gisdb",
# host = "159.89.13.241",
# port = 5432,
# user = "geobgu",
# password = "*******"
# )
# q = "SELECT name_lat, geometry FROM plants LIMIT 3;"
# st_read(con, query = q)
stplanr, dodgr, sfnetworks, igraphrmapshaperosmdata, osmextractggplot2, tmap, rasterVisleaflet, mapview, deckglmapsapigeosphere, s2, lwgeomFigure 8: Geometry (left) and attributes (right) of vector layers (https://www.neonscience.org/dc-shapefile-attributes-r)
Figure 9: Simple Feature types (see also: https://r-spatial.github.io/sf/articles/sf1.html)
| Type | Format | File extension |
|---|---|---|
| Binary | ESRI Shapefile | .shp, .shx, .dbf, .prj, … |
| GeoPackage (GPKG) | .gpkg |
|
| Plain Text | GeoJSON | .json or .geojson |
| GPS Exchange Format (GPX) | .gpx |
|
| Keyhole Markup Language (KML) | .kml |
|
| Spatial Databases | PostGIS / PostgreSQL |
sf data structuresPackage sf defines a hierarchical class system with three classes (Table ??):
sfg is for a single geometrysfc is a set of geometries with a CRSsf is a layer with attributes| class | hierarchy | data |
|---|---|---|
sfg |
geometry | coords, type, dimension |
sfc |
geometry column | set of sfg + CRS |
sf |
layer | sfc + attributes |
sf is a relative new (2016-) R package for handling vector layers in R. In the long-term, sf aims to replace rgdal (2003-), sp (2005-), and rgeos (2011-).
The main innovation in sf is a complete implementation of the Simple Features standard. Since 2003, Simple Features been widely implemented in spatial databases (such as PostGIS), commercial GIS (e.g., ESRI ArcGIS) and forms the vector data basis for libraries such as GDAL. When working with spatial databases, Simple Features are commonly specified as Well Known Text (WKT). A subset of simple features forms the GeoJSON standard.
It is important to note that the sf package depends on several external software components (installed along with the R package), most importantly GDAL, GEOS and PROJ (Figure 10). These well-tested and popular open-source components are common to numerous open-source and commercial software for spatial analysis, such as QGIS and PostGIS.
Figure 10: sf package dependencies2
The sf class extends the data.frame class to include a geometry column. This is similar to the way that spatial databases are structured. For example, the sample dataset shown in Figure 11 represents a polygonal layer with three features and six non-spatial attributes. The attributes refer to demographic and epidemiological attributes of US counties, such as the number of births in 1974 (BIR74), the number of sudden infant death cases in 1974 (SID74), and so on. The seventh column is the geometry column, containing the polygon geometries.
Figure 11: Structure of an sf object3
Figure 12 shows what layer in Figure 11 would look like when mapped. We can see the outline of the three polygons, as well as the values of all six non-spatial attributes (in separate panels).
Figure 12: Visualization of the sf object shown in Figure 11
Function st_read can be used to read vector layers into sf data structures:
library(sf)
## Linking to GEOS 3.8.1, GDAL 3.1.3, PROJ 7.1.1
nafot = st_read("data/nafot.shp")
## Reading layer `nafot' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/nafot.shp' using driver `ESRI Shapefile'
## Simple feature collection with 15 features and 1 field
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
Printing the object gives a summary of its properties, and the values of the (first 10) features:
nafot
## Simple feature collection with 15 features and 1 field
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng geometry
## 1 Zefat POLYGON ((739780.1 3686007,...
## 2 Jerusalem POLYGON ((672273.8 3518394,...
## 3 Kinneret POLYGON ((745560 3649860, 7...
## 4 Yizre'el POLYGON ((702283.1 3628046,...
## 5 Akko POLYGON ((702725.9 3630513,...
## 6 Golan POLYGON ((759304.4 3691202,...
## 7 Haifa POLYGON ((701391.6 3631170,...
## 8 Hadera POLYGON ((706537.1 3602188,...
## 9 Sharon POLYGON ((692687.3 3583974,...
## 10 Ramla POLYGON ((672841.5 3544808,...
As mentioned above, a layer (geometry+attributes) is represented by an sf object:
class(nafot)
## [1] "sf" "data.frame"
If we want just the geometric part, it can be extracted with st_geometry, resulting in an object of class sfg (geometry column):
st_geometry(nafot)
## Geometry set for 15 features
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 5 geometries:
## POLYGON ((739780.1 3686007, 739786.6 3686007, 7...
## POLYGON ((672273.8 3518394, 672368.4 3518511, 6...
## POLYGON ((745560 3649860, 745585.8 3649853, 745...
## POLYGON ((702283.1 3628046, 702283.5 3628046, 7...
## POLYGON ((702725.9 3630513, 702724.1 3630510, 7...
Conversely, If we want just the non-geometric part, it can be extracted with st_drop_geometry, resulting in a data.frame:
st_drop_geometry(nafot)
## name_eng
## 1 Zefat
## 2 Jerusalem
## 3 Kinneret
## 4 Yizre'el
## 5 Akko
## 6 Golan
## 7 Haifa
## 8 Hadera
## 9 Sharon
## 10 Ramla
## 11 Rehovot
## 12 Ashqelon
## 13 Be'er Sheva
## 14 Petah Tiqwa
## 15 Tel Aviv
The plot function is a quick way to see the spatial arrangment and attribute values in an sf layer. For example:
plot(nafot, key.width = lcm(4))
Figure 13: The nafot layer
A Coordinate Reference System (CRS) defines how the coordinates in the data relate to the surface of the Earth. There are two main types of CRS (Figure 14):
Figure 14: US counties in WGS84 and US Atlas projections
A vector layer can be reprojected with st_transform. The st_transform function has two parameters:
x—The layer to be reprojectedcrs—The target CRSWhere a CRS can be specified using:
4326)"+proj=longlat +datum=WGS84 +no_defs")CRS of nafot vector layer:
st_crs(nafot)
## Coordinate Reference System:
## User input: WGS 84 / UTM zone 36N
## wkt:
## PROJCRS["WGS 84 / UTM zone 36N",
## BASEGEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]],
## CONVERSION["UTM zone 36N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",33,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## ID["EPSG",32636]]
nafot_wgs84 = st_transform(nafot, 4326)
nafot_wgs84
## Simple feature collection with 15 features and 1 field
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 34.26679 ymin: 29.48737 xmax: 35.89468 ymax: 33.33479
## geographic CRS: WGS 84
## First 10 features:
## name_eng geometry
## 1 Zefat POLYGON ((35.57481 33.2865,...
## 2 Jerusalem POLYGON ((34.81956 31.78814...
## 3 Kinneret POLYGON ((35.6271 32.95949,...
## 4 Yizre'el POLYGON ((35.15965 32.77173...
## 5 Akko POLYGON ((35.16491 32.79389...
## 6 Golan POLYGON ((35.78575 33.32878...
## 7 Haifa POLYGON ((35.15081 32.80006...
## 8 Hadera POLYGON ((35.19932 32.53785...
## 9 Sharon POLYGON ((35.0482 32.37614,...
## 10 Ramla POLYGON ((34.83026 32.02623...
plot(st_geometry(nafot), main = "UTM", axes = TRUE)
plot(st_geometry(nafot_wgs84), main = "WGS84", axes = TRUE)
Figure 15: Nafot in UTM and WGS84 coordinate reference systems
In the following examples, we will use two vector layers:
nafot—“Nafa” administrative areas in Israelrail—Railways in IsraelFirst of all, we read the layers into R, using st_read:
nafot = st_read("data/nafot.shp")
## Reading layer `nafot' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/nafot.shp' using driver `ESRI Shapefile'
## Simple feature collection with 15 features and 1 field
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
rail = st_read("data/RAIL_STRATEGIC.shp")
## Reading layer `RAIL_STRATEGIC' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/RAIL_STRATEGIC.shp' using driver `ESRI Shapefile'
## Simple feature collection with 217 features and 6 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 157822.2 ymin: 385552.2 xmax: 255113.7 ymax: 790663.5
## projected CRS: Israel 1993 / Israeli TM Grid
nafot
## Simple feature collection with 15 features and 1 field
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng geometry
## 1 Zefat POLYGON ((739780.1 3686007,...
## 2 Jerusalem POLYGON ((672273.8 3518394,...
## 3 Kinneret POLYGON ((745560 3649860, 7...
## 4 Yizre'el POLYGON ((702283.1 3628046,...
## 5 Akko POLYGON ((702725.9 3630513,...
## 6 Golan POLYGON ((759304.4 3691202,...
## 7 Haifa POLYGON ((701391.6 3631170,...
## 8 Hadera POLYGON ((706537.1 3602188,...
## 9 Sharon POLYGON ((692687.3 3583974,...
## 10 Ramla POLYGON ((672841.5 3544808,...
rail
## Simple feature collection with 217 features and 6 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 157822.2 ymin: 385552.2 xmax: 255113.7 ymax: 790663.5
## projected CRS: Israel 1993 / Israeli TM Grid
## First 10 features:
## UPGRADE SHAPE_LENG SHAPE_Le_1 SEGMENT ISACTIVE
## 1 שדרוג 2040 14489.401 12379.715 כפר יהושע - נשר_1 פעיל
## 2 שדרוג 2030 2159.344 2274.112 באר יעקב-ראשונים_2 פעיל
## 3 שדרוג 2040 4942.306 4942.306 נתבג - נען_3 לא פעיל
## 4 שדרוג 2040 2829.892 2793.338 לב המפרץ מזרח - נשר_4 פעיל
## 5 שדרוג 2040 1976.491 1960.171 קרית גת - להבים_5 פעיל
## 6 שדרוג 2040 1195.701 1195.701 רמלה - רמלה מזרח_6 פעיל
## 7 שדרוג 2030 4224.799 4224.799 אור עקיבא - עתלית_7 לא פעיל
## 8 חדש 2040 36452.633 36452.633 מרכז ספיר-פארן_8 לא פעיל
## 9 שדרוג 2040 7822.722 7892.334 נען - בית שמש_9 פעיל
## 10 שדרוג 2030 4610.831 4176.157 נתבג - ההגנה_10 פעיל
## UPGRAD geometry
## 1 שדרוג בין 2030 ל-2040 LINESTRING (205530.1 741563...
## 2 שדרוג עד 2030 LINESTRING (181507.6 650706...
## 3 שדרוג בין 2030 ל-2040 LINESTRING (189180.6 645433...
## 4 שדרוג בין 2030 ל-2040 LINESTRING (203482.8 744181...
## 5 שדרוג בין 2030 ל-2040 LINESTRING (178574.1 609392...
## 6 שדרוג בין 2030 ל-2040 LINESTRING (189266.6 647211...
## 7 שדרוג עד 2030 LINESTRING (193268.9 719774...
## 8 פתיחה בין 2030 ל-2040 LINESTRING (219966 509396.8...
## 9 שדרוג בין 2030 ל-2040 LINESTRING (188081.3 642530...
## 10 שדרוג עד 2030 LINESTRING (184248 657573.3...
For any type of spatial analysis, we usually need all input layers to be in the same CTS. For that purpose, we will reproject the rail layer to the CRS of the nafot layer:
rail = st_transform(rail, st_crs(nafot))
We can plot each layer on its own, as shown above, to examine its attributes:
plot(nafot, key.width = lcm(4))
Figure 16: The nafot layer
plot(rail)
Figure 17: The rail layer
We can also plot the two layer geometries together, to examine their arrangement (Figure 18):
plot(st_geometry(nafot), border = "grey")
plot(st_geometry(rail), add = TRUE)
Figure 18: The nafot and rail geometries
Subsetting (filtering) of features in an sf vector layer is done in exactly the same way as filtering rows in a data.frame. For example, the following expression filters the rail layer to keep only those railway lines which are active:
rail = rail[rail$ISACTIVE == "פעיל", ]
rail
## Simple feature collection with 161 features and 6 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## UPGRADE SHAPE_LENG SHAPE_Le_1 SEGMENT ISACTIVE
## 1 שדרוג 2040 14489.401 12379.715 כפר יהושע - נשר_1 פעיל
## 2 שדרוג 2030 2159.344 2274.112 באר יעקב-ראשונים_2 פעיל
## 4 שדרוג 2040 2829.892 2793.338 לב המפרץ מזרח - נשר_4 פעיל
## 5 שדרוג 2040 1976.491 1960.171 קרית גת - להבים_5 פעיל
## 6 שדרוג 2040 1195.701 1195.701 רמלה - רמלה מזרח_6 פעיל
## 9 שדרוג 2040 7822.722 7892.334 נען - בית שמש_9 פעיל
## 10 שדרוג 2030 4610.831 4176.157 נתבג - ההגנה_10 פעיל
## 11 שדרוג 2040 1426.854 1426.854 סוקולוב - נורדאו_11 פעיל
## 13 שדרוג 2030 8758.880 8758.880 נהריה - עכו_13 פעיל
## 14 שדרוג 2030 5102.569 5102.569 חולות-יבנה מערב_14 פעיל
## UPGRAD geometry
## 1 שדרוג בין 2030 ל-2040 LINESTRING (692568.6 362751...
## 2 שדרוג עד 2030 LINESTRING (670422.8 353618...
## 4 שדרוג בין 2030 ל-2040 LINESTRING (690467.1 363008...
## 5 שדרוג בין 2030 ל-2040 LINESTRING (668326.9 349482...
## 6 שדרוג בין 2030 ל-2040 LINESTRING (678250.9 353284...
## 9 שדרוג בין 2030 ל-2040 LINESTRING (677161.1 352814...
## 10 שדרוג עד 2030 LINESTRING (673022.5 354310...
## 11 שדרוג בין 2030 ל-2040 LINESTRING (679299.3 356088...
## 13 שדרוג עד 2030 LINESTRING (696063 3653684,...
## 14 שדרוג עד 2030 LINESTRING (664704.1 353470...
We can also subset the columns (attributes) we need. For example, in the following expressions we create an ID variable segment_id, and keep only that variable:
rail$segment_id = 1:nrow(rail)
rail = rail["segment_id"]
rail
## Simple feature collection with 161 features and 1 field
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## segment_id geometry
## 1 1 LINESTRING (692568.6 362751...
## 2 2 LINESTRING (670422.8 353618...
## 4 3 LINESTRING (690467.1 363008...
## 5 4 LINESTRING (668326.9 349482...
## 6 5 LINESTRING (678250.9 353284...
## 9 6 LINESTRING (677161.1 352814...
## 10 7 LINESTRING (673022.5 354310...
## 11 8 LINESTRING (679299.3 356088...
## 13 9 LINESTRING (696063 3653684,...
## 14 10 LINESTRING (664704.1 353470...
We can also subset feature according to intersection with another layer, using the latter as an index. For example, the following expression creates a subset of nafot, named nafot1, with only those features intersecting the rail layer:
nafot1 = nafot[rail, ]
nafot1
## Simple feature collection with 12 features and 1 field
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 742395.3 ymax: 3665564
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng geometry
## 2 Jerusalem POLYGON ((672273.8 3518394,...
## 4 Yizre'el POLYGON ((702283.1 3628046,...
## 5 Akko POLYGON ((702725.9 3630513,...
## 7 Haifa POLYGON ((701391.6 3631170,...
## 8 Hadera POLYGON ((706537.1 3602188,...
## 9 Sharon POLYGON ((692687.3 3583974,...
## 10 Ramla POLYGON ((672841.5 3544808,...
## 11 Rehovot POLYGON ((660883 3525603, 6...
## 12 Ashqelon POLYGON ((660883 3525603, 6...
## 13 Be'er Sheva POLYGON ((639165.2 3481650,...
Figure 19 shows the nafot subset (in grey fill) and the railway lines layer.
plot(st_geometry(nafot), border = "grey50")
plot(st_geometry(nafot1), border = "grey50", col = "grey90", add = TRUE)
plot(st_geometry(rail), add = TRUE)
Figure 19: The nafot and rail geometries
Geometric operations on vector layers can conceptually be divided into three groups according to their output:
There are several functions to calculate numeric geometric properties of vector layers in package sf:
st_lengthst_areast_distancest_bboxst_dimensionFor example, we can calculate the area of each feature in the nafot layer (i.e. each state) using st_area:
nafot$area = st_area(nafot)
nafot$area[1:3]
## Units: [m^2]
## [1] 660340513 653746831 639422163
The result is an object of class units:
class(nafot$area)
## [1] "units"
We can convert measurements to different units with set_units from package units:
library(units)
## udunits system database from /usr/share/xml/udunits
nafot$area = set_units(nafot$area, "km^2")
nafot$area[1:3]
## Units: [km^2]
## [1] 660.3405 653.7468 639.4222
Inspecting the result:
plot(nafot[, "area"])
Figure 20: Calculated area attribute
Given two layers, x and y, the following logical geometric functions check whether each feature in x maintains the specified relation with each feature in y:
st_intersectsst_disjointst_touchesst_crossesst_withinst_containsst_overlapsst_coversst_covered_byst_equalsst_equals_exactWhen specifying sparse=FALSE the functions return a logical matrix. Each element i,j in the matrix is TRUE when f(x[i], y[j]) is TRUE. For example, this creates a matrix of intersection relations between nafot:
int = st_intersects(nafot1, nafot1, sparse = FALSE)
int
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
## [2,] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE
## [7,] TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE
## [8,] TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE
## [9,] TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
## [11,] FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
## [12,] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
The following code section visualizes the matrix:
int1 = apply(int, 2, rev)
int1 = t(int1)
image(int1, col = c("lightgrey", "red"), asp = 1, axes = FALSE)
axis(3, at = seq(0, 1, 1/(nrow(int1)-1)), labels = nafot1$name_eng, las = 2, lwd = 0, lwd.ticks = 1, cex.axis = 0.75)
axis(2, at = seq(0, 1, 1/(nrow(int1)-1)), labels = rev(nafot1$name_eng), las = 1, pos = -0.046, lwd = 0, lwd.ticks = 1, cex.axis = 0.75)
Figure 21: Intersection relations between nafot features
sf provides common geometry-generating functions applicable to individual geometries, such as:
st_centroidst_bufferst_samplest_convex_hullst_voronoiFigure 22: Geometry-generating operations on individual layers
For example, the following expression uses st_centroid to create a layer of “Nafa” centroids:
nafot_ctr = st_centroid(nafot)
They can be plotted as follows, the result is shown in Figure 23:
plot(st_geometry(nafot), border = "grey")
plot(st_geometry(nafot_ctr), col = "red", pch = 3, add = TRUE)
Figure 23: State centroids
Other geometry-generating functions work on pairs of input geometries (Figure 24):
st_intersectionst_differencest_sym_differencest_unionFigure 24: Geometry-generating operations on pairs of layers
For example, to calculate total rail length per “Nafa” we can use st_intersection to ‘split’ the rail layer by “Nafa”:
rail_int = st_intersection(rail, nafot)
rail_int
## Simple feature collection with 182 features and 3 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## segment_id name_eng area geometry
## 9 6 Jerusalem 653.7468 [km^2] LINESTRING (677169.6 352073...
## 22 16 Jerusalem 653.7468 [km^2] LINESTRING (677218.8 352065...
## 34 25 Jerusalem 653.7468 [km^2] LINESTRING (673559.5 351793...
## 106 88 Jerusalem 653.7468 [km^2] LINESTRING (688368.9 351530...
## 195 146 Jerusalem 653.7468 [km^2] MULTILINESTRING ((691812.2 ...
## 196 147 Jerusalem 653.7468 [km^2] LINESTRING (706324.4 351421...
## 1 1 Yizre'el 1231.8174 [km^2] LINESTRING (697854.1 361850...
## 55 44 Yizre'el 1231.8174 [km^2] LINESTRING (715373.6 361169...
## 66 55 Yizre'el 1231.8174 [km^2] LINESTRING (699306 3617893,...
## 153 121 Yizre'el 1231.8174 [km^2] LINESTRING (707133.8 361436...
The result is a new line layer split by “Nafa” borders and including the name_eng attribute:
plot(rail_int[, "name_eng"], lwd = 3, key.width = lcm(4), reset = FALSE)
plot(st_geometry(nafot), border = "lightgrey", add = TRUE)
Figure 25: Intersection result
The resulting layer has mixed LINESTERING and MULTILINESTRING geometries (Why?)
class(rail_int$geometry)
## [1] "sfc_GEOMETRY" "sfc"
To calculate line length, we need to convert it to MULTILINESTRING:
rail_int = st_cast(rail_int, "MULTILINESTRING")
rail_int
## Simple feature collection with 182 features and 3 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## segment_id name_eng area geometry
## 9 6 Jerusalem 653.7468 [km^2] MULTILINESTRING ((677169.6 ...
## 22 16 Jerusalem 653.7468 [km^2] MULTILINESTRING ((677218.8 ...
## 34 25 Jerusalem 653.7468 [km^2] MULTILINESTRING ((673559.5 ...
## 106 88 Jerusalem 653.7468 [km^2] MULTILINESTRING ((688368.9 ...
## 195 146 Jerusalem 653.7468 [km^2] MULTILINESTRING ((691812.2 ...
## 196 147 Jerusalem 653.7468 [km^2] MULTILINESTRING ((706324.4 ...
## 1 1 Yizre'el 1231.8174 [km^2] MULTILINESTRING ((697854.1 ...
## 55 44 Yizre'el 1231.8174 [km^2] MULTILINESTRING ((715373.6 ...
## 66 55 Yizre'el 1231.8174 [km^2] MULTILINESTRING ((699306 36...
## 153 121 Yizre'el 1231.8174 [km^2] MULTILINESTRING ((707133.8 ...
Let’s add a railway length attribute called length:
rail_int$length = st_length(rail_int)
rail_int$length = set_units(rail_int$length, km)
rail_int
## Simple feature collection with 182 features and 4 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## segment_id name_eng area geometry
## 9 6 Jerusalem 653.7468 [km^2] MULTILINESTRING ((677169.6 ...
## 22 16 Jerusalem 653.7468 [km^2] MULTILINESTRING ((677218.8 ...
## 34 25 Jerusalem 653.7468 [km^2] MULTILINESTRING ((673559.5 ...
## 106 88 Jerusalem 653.7468 [km^2] MULTILINESTRING ((688368.9 ...
## 195 146 Jerusalem 653.7468 [km^2] MULTILINESTRING ((691812.2 ...
## 196 147 Jerusalem 653.7468 [km^2] MULTILINESTRING ((706324.4 ...
## 1 1 Yizre'el 1231.8174 [km^2] MULTILINESTRING ((697854.1 ...
## 55 44 Yizre'el 1231.8174 [km^2] MULTILINESTRING ((715373.6 ...
## 66 55 Yizre'el 1231.8174 [km^2] MULTILINESTRING ((699306 36...
## 153 121 Yizre'el 1231.8174 [km^2] MULTILINESTRING ((707133.8 ...
## length
## 9 0.09790983 [km]
## 22 13.29636317 [km]
## 34 1.25629623 [km]
## 106 30.09022166 [km]
## 195 14.31484517 [km]
## 196 1.18204492 [km]
## 1 1.57447932 [km]
## 55 15.12395169 [km]
## 66 8.95013104 [km]
## 153 8.96550993 [km]
Next we aggregate attribute table of rail_int by state, to find the sum of length values:
rail_int = st_drop_geometry(rail_int)
rail_int = aggregate(rail_int["length"], rail_int["name_eng"], sum)
The result is a data.frame with total length of railway tracks, per “Nafa”:
head(rail_int)
## name_eng length
## 1 Akko 36.19126 [km]
## 2 Ashqelon 76.78070 [km]
## 3 Be'er Sheva 119.74786 [km]
## 4 Hadera 46.31090 [km]
## 5 Haifa 41.60683 [km]
## 6 Jerusalem 60.23768 [km]
Next, we can join the aggregated table back to the nafot layer:
nafot = merge(nafot, rail_int, by = "name_eng", all.x = TRUE)
Here is how the nafot layer looks like after the join:
nafot
## Simple feature collection with 15 features and 3 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng area length geometry
## 1 Akko 955.6361 [km^2] 36.19126 [km] POLYGON ((702725.9 3630513,...
## 2 Ashqelon 1270.6010 [km^2] 76.78070 [km] POLYGON ((660883 3525603, 6...
## 3 Be'er Sheva 13182.9382 [km^2] 119.74786 [km] POLYGON ((639165.2 3481650,...
## 4 Golan 1153.9042 [km^2] NA [km] POLYGON ((759304.4 3691202,...
## 5 Hadera 577.3104 [km^2] 46.31090 [km] POLYGON ((706537.1 3602188,...
## 6 Haifa 299.1068 [km^2] 41.60683 [km] POLYGON ((701391.6 3631170,...
## 7 Jerusalem 653.7468 [km^2] 60.23768 [km] POLYGON ((672273.8 3518394,...
## 8 Kinneret 639.4222 [km^2] NA [km] POLYGON ((745560 3649860, 7...
## 9 Petah Tiqwa 283.1575 [km^2] 35.57314 [km] POLYGON ((674110.3 3549169,...
## 10 Ramla 339.2468 [km^2] 82.70228 [km] POLYGON ((672841.5 3544808,...
Length of NA implies there are no railways in the polygon. These NA values should therefore be replaced with zero:
nafot$length[is.na(nafot$length)] = 0
plot(nafot[, "length"])
Figure 26: Total railway length per Nafa
Finally, we divide total railway length by state area. This gives us railway density per state:
nafot$density = nafot$length / nafot$area
Sorted:
nafot = nafot[order(nafot$density, decreasing = TRUE), ]
nafot
## Simple feature collection with 15 features and 4 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng area length geometry
## 10 Ramla 339.2468 [km^2] 82.70228 [km] POLYGON ((672841.5 3544808,...
## 13 Tel Aviv 175.6073 [km^2] 33.82369 [km] POLYGON ((674110.3 3549169,...
## 11 Rehovot 323.6068 [km^2] 47.65151 [km] POLYGON ((660883 3525603, 6...
## 6 Haifa 299.1068 [km^2] 41.60683 [km] POLYGON ((701391.6 3631170,...
## 9 Petah Tiqwa 283.1575 [km^2] 35.57314 [km] POLYGON ((674110.3 3549169,...
## 7 Jerusalem 653.7468 [km^2] 60.23768 [km] POLYGON ((672273.8 3518394,...
## 5 Hadera 577.3104 [km^2] 46.31090 [km] POLYGON ((706537.1 3602188,...
## 12 Sharon 348.4451 [km^2] 24.78527 [km] POLYGON ((692687.3 3583974,...
## 2 Ashqelon 1270.6010 [km^2] 76.78070 [km] POLYGON ((660883 3525603, 6...
## 1 Akko 955.6361 [km^2] 36.19126 [km] POLYGON ((702725.9 3630513,...
## density
## 10 0.24378209 [1/km]
## 13 0.19260989 [1/km]
## 11 0.14725126 [1/km]
## 6 0.13910357 [1/km]
## 9 0.12563022 [1/km]
## 7 0.09214221 [1/km]
## 5 0.08021836 [1/km]
## 12 0.07113107 [1/km]
## 2 0.06042865 [1/km]
## 1 0.03787138 [1/km]
Plotting the layer shows the new density attribute:
plot(nafot[, "density"])
Figure 27: railway density per state
plot(nafot)
Figure 28: Nafot layer with calculated attributes
In the second example, we are going to examine the dissimilarity between “Nafot” in terms of their age structure. The demographic data come at the statistical area level, which we are going to aggregate to the “Nafa” level. First, we read the statistical areas Shapefile:
stat = st_read("data/statisticalareas_demography2018.gdb")
## Reading layer `statisticalareas_demography2018' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/statisticalareas_demography2018.gdb' using driver `OpenFileGDB'
## Simple feature collection with 3194 features and 34 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 130175.2 ymin: 378139.7 xmax: 284068.5 ymax: 804530.3
## projected CRS: Israel 1993 / Israeli TM Grid
The layer has numerous columns:
vars = colnames(stat)
vars
## [1] "SEMEL_YISHUV" "STAT11" "YISHUV_STAT11"
## [4] "SHEM_YISHUV" "SHEM_YISHUV_ENGLISH" "Stat11_Unite"
## [7] "Stat11_Ref" "Main_Function_Code" "Main_Function_Txt"
## [10] "Religion_yishuv_Code" "Religion_yishuv_Txt" "Religion_Stat_Code"
## [13] "Religion_Stat_Txt" "Pop_Total" "Male_Total"
## [16] "Female_Total" "age_0_4" "age_5_9"
## [19] "age_10_14" "age_15_19" "age_20_24"
## [22] "age_25_29" "age_30_34" "age_35_39"
## [25] "age_40_44" "age_45_49" "age_50_54"
## [28] "age_55_59" "age_60_64" "age_65_69"
## [31] "age_70_74" "age_75_up" "SHAPE_Length"
## [34] "SHAPE_Area" "SHAPE"
but, in this case, we are only interested in the population estimates per age group:
vars = vars[grepl("age_", vars)]
vars
## [1] "age_0_4" "age_5_9" "age_10_14" "age_15_19" "age_20_24" "age_25_29"
## [7] "age_30_34" "age_35_39" "age_40_44" "age_45_49" "age_50_54" "age_55_59"
## [13] "age_60_64" "age_65_69" "age_70_74" "age_75_up"
We will retain only the latter the attributes:
stat = stat[vars]
Again, we need to make sure both layers are in the same CRS:
stat = st_transform(stat, st_crs(nafot))
The resulting subset of the stat layer is shown in Figure 29.
plot(stat, max.plot = 16)
Figure 29: Population estimates in the stat layer
Figure 30 shows one of the attributes in stat with the nafot layer on top:
plot(stat["age_10_14"], pal = hcl.colors(12, "Reds", rev = TRUE), border = "black", lwd = 0.07, reset = FALSE)
plot(st_geometry(nafot), border = "grey", add = TRUE)
Figure 30: The age_10_14 attribute in stat, and the nafot layer
One this we may notice in Figure 30 is that many of the statistical areas have NA values, meaning zero (rather than unknown) population size. For all practical purposes these should be replaced with “true” zero:
stat[is.na(stat)] = 0
The modification is demonstrated in Figure 31. Now we are ready to “transfer” the demographic estimates from the statistical area level, to the “Nafa” level.
plot(stat["age_10_14"], pal = hcl.colors(12, "Reds", rev = TRUE), border = "black", lwd = 0.07, reset = FALSE)
plot(st_geometry(nafot), border = "grey", add = TRUE)
Figure 31: The age_10_14 attribute in stat, and the nafot layer
x = st_interpolate_aw(stat, nafot, extensive = TRUE)
## Warning in st_interpolate_aw.sf(stat, nafot, extensive = TRUE):
## st_interpolate_aw assumes attributes are constant over areas of x
## Error in CPL_geos_op2(op, x, y): Evaluation error: ParseException: Unknown WKB type 12.
as.data.frame(table(st_geometry_type(stat)))
## Var1 Freq
## 1 GEOMETRY 0
## 2 POINT 0
## 3 LINESTRING 0
## 4 POLYGON 0
## 5 MULTIPOINT 0
## 6 MULTILINESTRING 0
## 7 MULTIPOLYGON 3068
## 8 GEOMETRYCOLLECTION 0
## 9 CIRCULARSTRING 0
## 10 COMPOUNDCURVE 0
## 11 CURVEPOLYGON 0
## 12 MULTICURVE 0
## 13 MULTISURFACE 126
## 14 CURVE 0
## 15 SURFACE 0
## 16 POLYHEDRALSURFACE 0
## 17 TIN 0
## 18 TRIANGLE 0
stat = st_cast(stat, "MULTIPOLYGON")
as.data.frame(table(st_geometry_type(stat)))
## Var1 Freq
## 1 GEOMETRY 0
## 2 POINT 0
## 3 LINESTRING 0
## 4 POLYGON 0
## 5 MULTIPOINT 0
## 6 MULTILINESTRING 0
## 7 MULTIPOLYGON 3194
## 8 GEOMETRYCOLLECTION 0
## 9 CIRCULARSTRING 0
## 10 COMPOUNDCURVE 0
## 11 CURVEPOLYGON 0
## 12 MULTICURVE 0
## 13 MULTISURFACE 0
## 14 CURVE 0
## 15 SURFACE 0
## 16 POLYHEDRALSURFACE 0
## 17 TIN 0
## 18 TRIANGLE 0
x = st_interpolate_aw(stat, nafot, extensive = TRUE)
## Warning in st_interpolate_aw.sf(stat, nafot, extensive = TRUE):
## st_interpolate_aw assumes attributes are constant over areas of x
## Error in CPL_geos_op2(op, x, y): Evaluation error: TopologyException: Input geom 0 is invalid: Ring Self-intersection at or near point 674902.54580551735 3528656.6016591629 at 674902.54580551735 3528656.6016591629.
stat = st_make_valid(stat)
x = st_interpolate_aw(stat, nafot, extensive = TRUE)
## Warning in st_interpolate_aw.sf(stat, nafot, extensive = TRUE):
## st_interpolate_aw assumes attributes are constant over areas of x
x$Group.1 = NULL
The result…
plot(x, max.plot = 16)
dat = st_drop_geometry(x)
rownames(dat) = nafot$name_eng
dat[, 1:6]
## age_0_4 age_5_9 age_10_14 age_15_19 age_20_24 age_25_29
## Ramla 34308.157 35480.099 32369.329 29095.881 24620.472 20846.904
## Tel Aviv 127130.244 109620.934 91668.354 82691.547 80979.798 100220.355
## Rehovot 52838.467 51748.292 45701.403 42056.490 37653.274 36262.685
## Haifa 44738.801 41258.817 36659.088 35189.068 37087.742 39010.933
## Petah Tiqwa 72960.980 72360.969 65454.855 56370.752 47684.046 42176.441
## Jerusalem 143014.908 127973.492 113769.180 103933.860 95599.193 83181.747
## Hadera 42729.350 42823.751 40581.731 39749.875 33351.647 30087.665
## Sharon 40842.886 42077.430 40111.430 36793.293 31939.422 28201.258
## Ashqelon 55514.217 50283.571 42664.732 39655.143 39107.549 36808.529
## Akko 56794.113 57174.737 57322.555 59915.198 52538.711 50274.451
## Yizre'el 48496.599 47099.450 45340.007 47387.870 42124.138 39068.472
## Be'er Sheva 81693.567 72531.221 63411.609 60628.745 56235.210 50365.623
## Golan 4541.896 4949.645 4886.759 5050.816 3791.931 3384.257
## Kinneret 10894.496 10116.174 9526.119 9484.625 8823.037 8893.782
## Zefat 12651.048 11674.798 10784.952 10414.150 9279.833 9640.825
dat = sweep(dat, 1, rowSums(dat), "/")
dat[, 1:6]
## age_0_4 age_5_9 age_10_14 age_15_19 age_20_24 age_25_29
## Ramla 0.09756403 0.10089675 0.09205048 0.08274159 0.07001462 0.05928351
## Tel Aviv 0.08968093 0.07732941 0.06466520 0.05833273 0.05712522 0.07069800
## Rehovot 0.08780640 0.08599476 0.07594610 0.06988902 0.06257181 0.06026094
## Haifa 0.07696696 0.07098013 0.06306693 0.06053796 0.06380437 0.06711295
## Petah Tiqwa 0.09512865 0.09434633 0.08534194 0.07349783 0.06217185 0.05499087
## Jerusalem 0.12794951 0.11449258 0.10178457 0.09298532 0.08552864 0.07441926
## Hadera 0.09602274 0.09623488 0.09119654 0.08932717 0.07494887 0.06761395
## Sharon 0.08737008 0.09001098 0.08580536 0.07870729 0.06832401 0.06032742
## Ashqelon 0.10039783 0.09093817 0.07715945 0.07171659 0.07072626 0.06656847
## Akko 0.08797158 0.08856115 0.08879011 0.09280600 0.08138015 0.07787291
## Yizre'el 0.09384794 0.09114426 0.08773948 0.09170239 0.08151631 0.07560315
## Be'er Sheva 0.11874226 0.10542472 0.09216929 0.08812437 0.08173833 0.07320684
## Golan 0.09003600 0.09811900 0.09687238 0.10012454 0.07516912 0.06708762
## Kinneret 0.09559008 0.08876096 0.08358372 0.08321964 0.07741476 0.07803549
## Zefat 0.10276190 0.09483203 0.08760399 0.08459204 0.07537821 0.07831048
d = dist(dat)
hc = hclust(d, "average")
k = 3
groups = cutree(hc, k = k)
plot(hc)
rect.hclust(hc, k = k)
library(reshape2)
dat2 = dat
dat2$group = groups
dat2$name = rownames(dat2)
dat2 = melt(dat2, id.vars = c("group", "name"))
head(dat2)
## group name variable value
## 1 1 Ramla age_0_4 0.09756403
## 2 2 Tel Aviv age_0_4 0.08968093
## 3 1 Rehovot age_0_4 0.08780640
## 4 2 Haifa age_0_4 0.07696696
## 5 1 Petah Tiqwa age_0_4 0.09512865
## 6 3 Jerusalem age_0_4 0.12794951
library(ggplot2)
ggplot(dat2, aes(x = variable, y = value, group = name)) +
geom_line() +
facet_wrap(~ group)
Other:
sf tutorial from useR!2017 conferencesf tutorial from rstudio::conf 2018 conferenceThank you for listening!